Skip to content

feat(stats): Add 1% low FPS tracking#2661

Open
githubawn wants to merge 9 commits into
TheSuperHackers:mainfrom
githubawn:feature/fps-1percent-low-hud
Open

feat(stats): Add 1% low FPS tracking#2661
githubawn wants to merge 9 commits into
TheSuperHackers:mainfrom
githubawn:feature/fps-1percent-low-hud

Conversation

@githubawn
Copy link
Copy Markdown

@githubawn githubawn commented Apr 29, 2026

This PR adds a 1% low FPS metric to the existing FPS counter HUD, displayed in parentheses next to the average FPS. The 1% low is a standard performance metric used to surface frame time spikes that the average FPS hides. Inspired by #1942.

Added 1% low FPS display to HUD counter
Added m_renderFpsLowString and supporting UI members
Added RenderFpsLowColor configuration to InGameUI INI
Increased history to 5,000 time-bounded frames
Implemented rolling 0.5s window for average FPS
Implemented rolling 3.0s window for 1% lows

The following screenshot from AOD Cobalt Rush shows the 1% low FPS overlay compared to CapFrameX (an external benchmarking tool, centered right), demonstrating the value of surfacing this metric separately from the average.

lowfps

This change was generated with AI assistance. All generated code has been reviewed, tested, and verified for correctness. The implementation went through multiple iterations, including fundamental changes to the underlying approach, as well as passes to apply simplifications, fix inconsistencies, and optimize performance. Both Generals and GeneralsMD implementations are included in this PR with identical code.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 29, 2026

Greptile Summary

This PR adds a 1% low FPS metric to the in-game HUD, displayed next to the existing average FPS counter. The implementation replaces the old 30-frame static ring buffer with a 4096-entry time-bounded history supporting a 1.0 s average window and a 3 s window for nth_element-based 1% low calculation.

  • A new getLow1PercentFPS() virtual is added to the Display interface and implemented in W3DDisplay for both Generals and GeneralsMD; a new RenderFpsLowColor INI field allows per-configuration color tuning.
  • The FPS limit HUD label is updated to show [X] (Generals) or \u25b2X (GeneralsMD) for the uncapped state instead of [0].
  • The ring-buffer logic and nth_element math are correct, but a static local timer (lastLowUpdate) was not promoted to a member variable like all other statics removed in this refactor, and the Generals vs GeneralsMD format strings/colors diverge despite the PR claiming identical implementations.

Confidence Score: 5/5

Safe to merge; changes are limited to the FPS overlay and do not touch game logic or data paths.

The ring-buffer implementation and nth_element 1% low calculation are correct. All findings are non-blocking quality improvements: a lingering static local that the rest of the refactor already addressed, stale doc comments, and a formatting question between the two game variants.

Both W3DDisplay.cpp files for the static timer cleanup, and GeneralsMD/InGameUI.cpp to clarify whether the Unicode arrow format is intentional.

Important Files Changed

Filename Overview
Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Core FPS metric refactor: replaces 30-frame static history with a 4096-entry ring buffer; contains a static local timer not migrated to a member variable like the rest of the statics.
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Mirror of Generals W3DDisplay changes; same static local timer issue and single-line if style violations.
Generals/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h Adds ring buffer member arrays; has a stale 0.5s comment and spurious extra semicolon in the enum declaration.
GeneralsMD/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h Same as Generals header; same stale comment and double-semicolon typo.
Generals/Code/GameEngine/Source/GameClient/InGameUI.cpp Adds m_renderFpsLowString, INI color config, and draw calls; refactors FPS limit display to show [X] for uncapped. Clean implementation.
GeneralsMD/Code/GameEngine/Source/GameClient/InGameUI.cpp Analogous to Generals InGameUI but uses Unicode arrows and a different default color, diverging from Generals despite the PR claiming identical code.
Core/GameEngine/Include/GameClient/Display.h Adds pure virtual getLow1PercentFPS() to the Display interface; straightforward and correct.
Generals/Code/Tools/GUIEdit/Include/GUIEditDisplay.h Stub getLow1PercentFPS() returning 0 for the GUI editor display, consistent with other stubs.

Sequence Diagram

sequenceDiagram
    participant Draw as W3DDisplay::draw()
    participant Metrics as updatePerformanceMetrics()
    participant Sample as addFpsSample()
    participant Avg as calculateAverageFPS(1.0s)
    participant Low as calculateLow1PercentFPS(3.0s)
    participant UI as InGameUI::drawRenderFps()
    participant Disp as Display interface

    Draw->>Metrics: called every frame
    Metrics->>Sample: elapsedSeconds
    Sample->>Sample: write fpsHistory and durationHistory at offset
    Sample-->>Metrics: ring buffer updated
    Metrics->>Avg: "windowSeconds=1.0"
    Avg-->>Metrics: m_averageFPS
    Metrics->>Low: every 100ms via timeGetTime()
    Low->>Low: nth_element on sortBuffer bottom 1%
    Low-->>Metrics: m_low1PercentFPS
    UI->>Disp: getAverageFPS()
    Disp-->>UI: m_averageFPS
    UI->>Disp: getLow1PercentFPS()
    Disp-->>UI: m_low1PercentFPS
    UI->>UI: draw e.g. 60 (52) [120]
Loading
Prompt To Fix All With AI
Fix the following 6 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 6
Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp:1069-1075
The `lastLowUpdate` timer was left as a `static` local while the rest of the old statics (`lastUpdateTime64`, `historyOffset`, `fpsHistory`) were all migrated to member variables in this very refactor. If `W3DDisplay` is torn down and re-created (display mode switch, windowed/fullscreen toggle, etc.) the static retains its previous value, meaning the new instance's `m_low1PercentFPS` will stay at its constructor-initialized value until the 100 ms interval elapses from that leftover timestamp. Promoting it to a member variable completes the refactor consistently.

```suggestion
	UnsignedInt now = timeGetTime();
	if (now - m_lastLowUpdate >= 100) // update low 1% metrics at 100ms intervals instead of 1000ms since it is now extremely cheap
	{
		m_lastLowUpdate = now;
		m_low1PercentFPS = calculateLow1PercentFPS(3.0f);
	}
```

### Issue 2 of 6
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp:1120-1126
Same static-local timer issue as in the Generals copy — `lastLowUpdate` should be a member variable `m_lastLowUpdate` to be consistent with the rest of the refactoring and to support clean re-initialization.

```suggestion
	UnsignedInt now = timeGetTime();
	if (now - m_lastLowUpdate >= 100) // update low 1% metrics at 100ms intervals instead of 1000ms since it is now extremely cheap
	{
		m_lastLowUpdate = now;
		m_low1PercentFPS = calculateLow1PercentFPS(3.0f);
	}
```

### Issue 3 of 6
Generals/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h:179-183
The `m_averageFPS` doc comment says "0.5s" but `updatePerformanceMetrics()` calls `calculateAverageFPS(1.0f)`. Also there is a spurious extra semicolon after the enum declaration (`}; ;`).

```suggestion
	Real m_averageFPS;		///< average fps over the last 1.0s.
	Real m_low1PercentFPS;	///< 1% low fps.
	Real m_currentFPS;		///< current fps value.

	enum { FPS_HISTORY_SIZE = 4096 }; // degrades gracefully beyond this size
```

### Issue 4 of 6
GeneralsMD/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h:179-183
Same stale "0.5s" doc comment and double-semicolon typo as in the Generals header.

```suggestion
	Real m_averageFPS;		///< average fps over the last 1.0s.
	Real m_low1PercentFPS;	///< 1% low fps.
	Real m_currentFPS;		///< current fps value.

	enum { FPS_HISTORY_SIZE = 4096 }; // degrades gracefully beyond this size
```

### Issue 5 of 6
Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp:999-1001
These single-line `if` bodies violate the project rule requiring the statement to be on a separate line from the condition. The same pattern appears in `calculateLow1PercentFPS` at line 1022, and both occurrences are also present in the GeneralsMD copy.

```suggestion
		if (idx < 0)
			idx += FPS_HISTORY_SIZE;
		timeSum += m_durationHistory[idx];
		samples++;
```

### Issue 6 of 6
GeneralsMD/Code/GameEngine/Source/GameClient/InGameUI.cpp:6047-6063
**Generals vs GeneralsMD display format divergence**

The PR description states "Both Generals and GeneralsMD implementations are included in this PR with identical code," but the two are visibly different. GeneralsMD formats the 1% low as `▼N` (`\x25BC%u`) and the FPS cap as `▲N`/`▲X`, while Generals uses `(%u)` and `[N]`/`[X]`. The default `m_renderFpsLowColor` also differs — `(180, 170, 120, 255)` in Generals vs `(190, 180, 130, 255)` in GeneralsMD. Are the Unicode arrows intentional for Zero Hour only, or is one of these meant to mirror the other?

Reviews (8): Last reviewed commit: "Buffer Footprint: Changed FPS history si..." | Re-trigger Greptile

Comment thread Generals/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
githubawn added 4 commits May 1, 2026 01:13
Move FPS history state into W3DDisplay members.
Implement accurate time-based windowing for frame metrics.
Use ceiling logic for improved 1% low accuracy.
Optimize percentile calculation using efficient selection algorithm.
Rename and centralize performance update call sites.
Increase history buffer for stable high-FPS monitoring.
Update average FPS math to use time-weighted mean.
Move sortBuffer to class members for consistency.
@githubawn githubawn requested a review from Skyaero42 April 30, 2026 23:53
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Real m_low1PercentFPS; ///<1% low fps.
Real m_currentFPS; ///<current fps value.

enum { FPS_HISTORY_SIZE = 5000 }; // covers 5s at 1000 FPS, degrades gracefully beyond
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this number needs evaluation but I would like input from others.

As it stands, 5000 samples in three Real arrays require 60 kB in memory - just for the FPS trackers.
I think the question should be: given 3 seconds timeframe for the 1% low FPS, At what 3-second average FPS is the low fps no longer relevant. Say you average 300 fps, what is the chance that the 1% is so low that it is still relevant as a performance metric. If 300 fps is the upper bound, only 900 samples need to be stored. That's only 18% of the memory needed compared to the current setting.

Copy link
Copy Markdown
Author

@githubawn githubawn May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say you average 300 fps, what is the chance that the 1% is so low that it is still relevant as a performance metric.

It's pretty common to see 300 average fps and 30 fps lows in this game in large skirmish matches even on vs2022 non-retail.

But definitely would like to hear multiple inputs for this number.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation approach is fine, but the data size requirements for the fps counters are outrageous.

I inquired a bit with Chat Gippy and it would be possible to reduce size requirements by quantizing samples and moving intermediate samples into timed buckets.

Here is a sample FrameBucket with avg and min frame time, stored in 2 bytes each, with a resolution of 16 micro seconds.

// Compact frametime statistics bucket.
//
// Stores average and maximum frametime values using 16-bit integers
// with fixed-point quantization.
//
// ------------------------------------------------------------------
// Encoding
// ------------------------------------------------------------------
//
// Values are stored in units of 16 microseconds:
//
//     stored_value = frametime_us / 16
//
// This allows a uint16_t to represent:
//
//     65535 * 16 us = 1,048,560 us
//                   = ~1048 ms
//
// which comfortably covers extremely slow frames (~1 FPS).
//
// ------------------------------------------------------------------
// Precision
// ------------------------------------------------------------------
//
// Quantization step:
//
//     16 us = 0.016 ms
//
// Maximum quantization error:
//
//     +/- 8 us
//
// This is effectively negligible for FPS telemetry.
//
// Example:
//
//     16.667 ms frame (60 FPS)
//
// Encoded as:
//
//     16667 us / 16 = 1042
//
// Decoded:
//
//     1042 * 16 = 16672 us
//
// Error:
//
//     +5 us = 0.005 ms
//
// ------------------------------------------------------------------
// Why fixed-point quantization?
// ------------------------------------------------------------------
//
// Advantages:
//
// - extremely compact (4 bytes total per bucket)
// - deterministic runtime cost
// - cache friendly
// - no floating-point storage
// - sufficient precision for telemetry
// - supports >1 second frametimes
//
// ------------------------------------------------------------------
// Intended usage
// ------------------------------------------------------------------
//
// Typical bucket generation:
//
//     avgFrameUs = accumulatedFrameUs / frameCount
//     maxFrameUs = worstFrameUsSeen
//
// Then:
//
//     bucket.SetAvgUs(avgFrameUs);
//     bucket.SetMaxUs(maxFrameUs);
//
// ------------------------------------------------------------------
// Recommended usage pattern
// ------------------------------------------------------------------
//
// Build buckets over fixed time windows:
//
//     e.g. 50 ms or 100 ms
//
// rather than fixed frame counts.
//
// This keeps statistical resolution stable across varying FPS.
//
// ------------------------------------------------------------------

struct FrameBucket
{
    uint16_t avg16us;
    uint16_t max16us;

    static constexpr uint32_t QUANTUM_US    = 16;
    static constexpr uint32_t QUANTUM_SHIFT = 4;

    static constexpr uint32_t MAX_US =
        0xFFFFu * QUANTUM_US;

    // Encode microseconds into 16 us fixed-point units.
    //
    // Rounds to nearest unit.
    //
    // Input is clamped to representable range.
    static uint16_t EncodeUs(uint32_t us)
    {
        if (us > MAX_US)
            us = MAX_US;

        return static_cast<uint16_t>(
            (us + (QUANTUM_US / 2)) >> QUANTUM_SHIFT);
    }

    // Decode fixed-point units back into microseconds.
    static uint32_t DecodeUs(uint16_t v)
    {
        return static_cast<uint32_t>(v)
            << QUANTUM_SHIFT;
    }

    void SetAvgUs(uint32_t us)
    {
        avg16us = EncodeUs(us);
    }

    void SetMaxUs(uint32_t us)
    {
        max16us = EncodeUs(us);
    }

    uint32_t GetAvgUs() const
    {
        return DecodeUs(avg16us);
    }

    uint32_t GetMaxUs() const
    {
        return DecodeUs(max16us);
    }

    float GetAvgMs() const
    {
        return static_cast<float>(GetAvgUs()) * 0.001f;
    }

    float GetMaxMs() const
    {
        return static_cast<float>(GetMaxUs()) * 0.001f;
    }

    float GetAvgFPS() const
    {
        const uint32_t us = GetAvgUs();

        return (us > 0)
            ? (1000000.0f / static_cast<float>(us))
            : 0.0f;
    }

    float GetMinFPS() const
    {
        const uint32_t us = GetMaxUs();

        return (us > 0)
            ? (1000000.0f / static_cast<float>(us))
            : 0.0f;
    }
};

And then we move these FrameBuckets into a time based array.

Sample implementation:

// ============================================================================
// Rolling 3-second FPS statistics
// ============================================================================
//
// Stores:
//
//     3 seconds @ 10 ms resolution
//
// = 300 buckets
//
// Each bucket summarizes:
//
//     - average frametime
//     - worst frametime
//
// over a 10 ms interval.
//
// ============================================================================

class FpsHistory
{
public:

    static constexpr uint32_t BUCKET_INTERVAL_US = 10000; // 10 ms
    static constexpr uint32_t HISTORY_SECONDS    = 3;

    static constexpr uint32_t BUCKET_COUNT =
        (HISTORY_SECONDS * 1000000) / BUCKET_INTERVAL_US;

    // ------------------------------------------------------------------------

    void Reset()
    {
        m_writeIndex = 0;
        m_bucketCount = 0;

        m_accumulatedUs = 0;
        m_accumulatedFrames = 0;
        m_maxFrameUs = 0;
        m_bucketElapsedUs = 0;

        std::fill(
            std::begin(m_buckets),
            std::end(m_buckets),
            FrameBucket{});
    }

    // ------------------------------------------------------------------------
    // Add a frame
    //
    // frameUs:
    //     frametime in microseconds
    //
    // Example:
    //
    //     16.667 ms = 16667 us
    //
    // ------------------------------------------------------------------------

    void AddFrame(uint32_t frameUs)
    {
        // Accumulate stats for current bucket.

        m_accumulatedUs += frameUs;
        m_accumulatedFrames++;

        if (frameUs > m_maxFrameUs)
            m_maxFrameUs = frameUs;

        m_bucketElapsedUs += frameUs;

        // Emit one or more buckets if enough time elapsed.

        while (m_bucketElapsedUs >= BUCKET_INTERVAL_US)
        {
            EmitBucket();

            m_bucketElapsedUs -= BUCKET_INTERVAL_US;
        }
    }

    // ------------------------------------------------------------------------

    float GetAverageFPS() const
    {
        if (m_bucketCount == 0)
            return 0.0f;

        uint64_t totalUs = 0;

        for (uint32_t i = 0; i < m_bucketCount; ++i)
        {
            totalUs += m_buckets[i].GetAvgUs();
        }

        const float avgUs =
            static_cast<float>(totalUs)
            / static_cast<float>(m_bucketCount);

        return avgUs > 0.0f
            ? (1000000.0f / avgUs)
            : 0.0f;
    }

    // ------------------------------------------------------------------------
    // Approximate 1% low FPS
    //
    // Uses the worst bucket frametimes.
    //
    // This is intentionally approximate and designed for:
    //
    // - low memory use
    // - deterministic runtime
    // - good stutter detection
    //
    // ------------------------------------------------------------------------

    float GetOnePercentLowFPS() const
    {
        if (m_bucketCount == 0)
            return 0.0f;

        uint32_t worstUs = 0;

        // Find worst bucket maximum frametime.

        for (uint32_t i = 0; i < m_bucketCount; ++i)
        {
            worstUs = std::max(
                worstUs,
                m_buckets[i].GetMaxUs());
        }

        return worstUs > 0
            ? (1000000.0f / static_cast<float>(worstUs))
            : 0.0f;
    }

private:

    // ------------------------------------------------------------------------

    void EmitBucket()
    {
        FrameBucket& bucket =
            m_buckets[m_writeIndex];

        // Compute average frametime for bucket.

        const uint32_t avgUs =
            (m_accumulatedFrames > 0)
            ? static_cast<uint32_t>(
                m_accumulatedUs / m_accumulatedFrames)
            : 0;

        bucket.SetAvgUs(avgUs);
        bucket.SetMaxUs(m_maxFrameUs);

        // Advance ring buffer.

        m_writeIndex =
            (m_writeIndex + 1) % BUCKET_COUNT;

        if (m_bucketCount < BUCKET_COUNT)
            ++m_bucketCount;

        // Reset accumulators.

        m_accumulatedUs = 0;
        m_accumulatedFrames = 0;
        m_maxFrameUs = 0;
    }

private:

    FrameBucket m_buckets[BUCKET_COUNT];

    uint32_t m_writeIndex = 0;
    uint32_t m_bucketCount = 0;

    uint64_t m_accumulatedUs = 0;
    uint32_t m_accumulatedFrames = 0;
    uint32_t m_maxFrameUs = 0;

    uint32_t m_bucketElapsedUs = 0;
};

// ============================================================================
// Example usage
// ============================================================================

int main()
{
    FpsHistory history;

    history.Reset();

    // Simulate ~60 FPS.

    for (int i = 0; i < 500; ++i)
    {
        uint32_t frameUs = 16667;

        // Simulate occasional stutter.

        if ((i % 120) == 0)
        {
            frameUs = 50000; // 50 ms hitch
        }

        history.AddFrame(frameUs);
    }

    printf(
        "Average FPS: %.2f\n",
        history.GetAverageFPS());

    printf(
        "Approx 1%% Low FPS: %.2f\n",
        history.GetOnePercentLowFPS());

    return 0;
}

This data organization approach uses just 1200 bytes at most. Bucket intervals could also be longer for even less size.

It loses value accuracy. It optimizes for speed over accuracy.

However, your current implemention would perhaps be more efficient at low frame rates, because less values need reading at small frame counts. I do like that aspect, because low fps runtimes need to do less.

I suggest to look into this more how to find a good balance between size,speed and accuracy. There are many options here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some optimizations into this PR.
Switched to 4096 for now, 48kb is still very luxurious for a framerate counter, but the current implementation is very easy on the cpu.

@xezon
Copy link
Copy Markdown

xezon commented May 5, 2026

I do not like the visuals of the new value. Can this look better?

@xezon xezon added Enhancement Is new feature or request GUI For graphical user interface Minor Severity: Minor < Major < Critical < Blocker Gen Relates to Generals ZH Relates to Zero Hour labels May 7, 2026
@xezon
Copy link
Copy Markdown

xezon commented May 7, 2026

Suggestion:

image

Triangle down code is: \x25BC
Triangle up code is: \x25B2

Uncapped FPS:

image

X is ascii X

X because 0 makes no sense as a value for no cap.

@xezon
Copy link
Copy Markdown

xezon commented May 7, 2026

Maybe make the 1% low value a bit brighter. It can be difficult to read in game.

Comment thread GeneralsMD/Code/GameEngine/Source/GameClient/InGameUI.cpp Outdated
Comment thread GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Real m_low1PercentFPS; ///<1% low fps.
Real m_currentFPS; ///<current fps value.

enum { FPS_HISTORY_SIZE = 5000 }; // covers 5s at 1000 FPS, degrades gracefully beyond
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation approach is fine, but the data size requirements for the fps counters are outrageous.

I inquired a bit with Chat Gippy and it would be possible to reduce size requirements by quantizing samples and moving intermediate samples into timed buckets.

Here is a sample FrameBucket with avg and min frame time, stored in 2 bytes each, with a resolution of 16 micro seconds.

// Compact frametime statistics bucket.
//
// Stores average and maximum frametime values using 16-bit integers
// with fixed-point quantization.
//
// ------------------------------------------------------------------
// Encoding
// ------------------------------------------------------------------
//
// Values are stored in units of 16 microseconds:
//
//     stored_value = frametime_us / 16
//
// This allows a uint16_t to represent:
//
//     65535 * 16 us = 1,048,560 us
//                   = ~1048 ms
//
// which comfortably covers extremely slow frames (~1 FPS).
//
// ------------------------------------------------------------------
// Precision
// ------------------------------------------------------------------
//
// Quantization step:
//
//     16 us = 0.016 ms
//
// Maximum quantization error:
//
//     +/- 8 us
//
// This is effectively negligible for FPS telemetry.
//
// Example:
//
//     16.667 ms frame (60 FPS)
//
// Encoded as:
//
//     16667 us / 16 = 1042
//
// Decoded:
//
//     1042 * 16 = 16672 us
//
// Error:
//
//     +5 us = 0.005 ms
//
// ------------------------------------------------------------------
// Why fixed-point quantization?
// ------------------------------------------------------------------
//
// Advantages:
//
// - extremely compact (4 bytes total per bucket)
// - deterministic runtime cost
// - cache friendly
// - no floating-point storage
// - sufficient precision for telemetry
// - supports >1 second frametimes
//
// ------------------------------------------------------------------
// Intended usage
// ------------------------------------------------------------------
//
// Typical bucket generation:
//
//     avgFrameUs = accumulatedFrameUs / frameCount
//     maxFrameUs = worstFrameUsSeen
//
// Then:
//
//     bucket.SetAvgUs(avgFrameUs);
//     bucket.SetMaxUs(maxFrameUs);
//
// ------------------------------------------------------------------
// Recommended usage pattern
// ------------------------------------------------------------------
//
// Build buckets over fixed time windows:
//
//     e.g. 50 ms or 100 ms
//
// rather than fixed frame counts.
//
// This keeps statistical resolution stable across varying FPS.
//
// ------------------------------------------------------------------

struct FrameBucket
{
    uint16_t avg16us;
    uint16_t max16us;

    static constexpr uint32_t QUANTUM_US    = 16;
    static constexpr uint32_t QUANTUM_SHIFT = 4;

    static constexpr uint32_t MAX_US =
        0xFFFFu * QUANTUM_US;

    // Encode microseconds into 16 us fixed-point units.
    //
    // Rounds to nearest unit.
    //
    // Input is clamped to representable range.
    static uint16_t EncodeUs(uint32_t us)
    {
        if (us > MAX_US)
            us = MAX_US;

        return static_cast<uint16_t>(
            (us + (QUANTUM_US / 2)) >> QUANTUM_SHIFT);
    }

    // Decode fixed-point units back into microseconds.
    static uint32_t DecodeUs(uint16_t v)
    {
        return static_cast<uint32_t>(v)
            << QUANTUM_SHIFT;
    }

    void SetAvgUs(uint32_t us)
    {
        avg16us = EncodeUs(us);
    }

    void SetMaxUs(uint32_t us)
    {
        max16us = EncodeUs(us);
    }

    uint32_t GetAvgUs() const
    {
        return DecodeUs(avg16us);
    }

    uint32_t GetMaxUs() const
    {
        return DecodeUs(max16us);
    }

    float GetAvgMs() const
    {
        return static_cast<float>(GetAvgUs()) * 0.001f;
    }

    float GetMaxMs() const
    {
        return static_cast<float>(GetMaxUs()) * 0.001f;
    }

    float GetAvgFPS() const
    {
        const uint32_t us = GetAvgUs();

        return (us > 0)
            ? (1000000.0f / static_cast<float>(us))
            : 0.0f;
    }

    float GetMinFPS() const
    {
        const uint32_t us = GetMaxUs();

        return (us > 0)
            ? (1000000.0f / static_cast<float>(us))
            : 0.0f;
    }
};

And then we move these FrameBuckets into a time based array.

Sample implementation:

// ============================================================================
// Rolling 3-second FPS statistics
// ============================================================================
//
// Stores:
//
//     3 seconds @ 10 ms resolution
//
// = 300 buckets
//
// Each bucket summarizes:
//
//     - average frametime
//     - worst frametime
//
// over a 10 ms interval.
//
// ============================================================================

class FpsHistory
{
public:

    static constexpr uint32_t BUCKET_INTERVAL_US = 10000; // 10 ms
    static constexpr uint32_t HISTORY_SECONDS    = 3;

    static constexpr uint32_t BUCKET_COUNT =
        (HISTORY_SECONDS * 1000000) / BUCKET_INTERVAL_US;

    // ------------------------------------------------------------------------

    void Reset()
    {
        m_writeIndex = 0;
        m_bucketCount = 0;

        m_accumulatedUs = 0;
        m_accumulatedFrames = 0;
        m_maxFrameUs = 0;
        m_bucketElapsedUs = 0;

        std::fill(
            std::begin(m_buckets),
            std::end(m_buckets),
            FrameBucket{});
    }

    // ------------------------------------------------------------------------
    // Add a frame
    //
    // frameUs:
    //     frametime in microseconds
    //
    // Example:
    //
    //     16.667 ms = 16667 us
    //
    // ------------------------------------------------------------------------

    void AddFrame(uint32_t frameUs)
    {
        // Accumulate stats for current bucket.

        m_accumulatedUs += frameUs;
        m_accumulatedFrames++;

        if (frameUs > m_maxFrameUs)
            m_maxFrameUs = frameUs;

        m_bucketElapsedUs += frameUs;

        // Emit one or more buckets if enough time elapsed.

        while (m_bucketElapsedUs >= BUCKET_INTERVAL_US)
        {
            EmitBucket();

            m_bucketElapsedUs -= BUCKET_INTERVAL_US;
        }
    }

    // ------------------------------------------------------------------------

    float GetAverageFPS() const
    {
        if (m_bucketCount == 0)
            return 0.0f;

        uint64_t totalUs = 0;

        for (uint32_t i = 0; i < m_bucketCount; ++i)
        {
            totalUs += m_buckets[i].GetAvgUs();
        }

        const float avgUs =
            static_cast<float>(totalUs)
            / static_cast<float>(m_bucketCount);

        return avgUs > 0.0f
            ? (1000000.0f / avgUs)
            : 0.0f;
    }

    // ------------------------------------------------------------------------
    // Approximate 1% low FPS
    //
    // Uses the worst bucket frametimes.
    //
    // This is intentionally approximate and designed for:
    //
    // - low memory use
    // - deterministic runtime
    // - good stutter detection
    //
    // ------------------------------------------------------------------------

    float GetOnePercentLowFPS() const
    {
        if (m_bucketCount == 0)
            return 0.0f;

        uint32_t worstUs = 0;

        // Find worst bucket maximum frametime.

        for (uint32_t i = 0; i < m_bucketCount; ++i)
        {
            worstUs = std::max(
                worstUs,
                m_buckets[i].GetMaxUs());
        }

        return worstUs > 0
            ? (1000000.0f / static_cast<float>(worstUs))
            : 0.0f;
    }

private:

    // ------------------------------------------------------------------------

    void EmitBucket()
    {
        FrameBucket& bucket =
            m_buckets[m_writeIndex];

        // Compute average frametime for bucket.

        const uint32_t avgUs =
            (m_accumulatedFrames > 0)
            ? static_cast<uint32_t>(
                m_accumulatedUs / m_accumulatedFrames)
            : 0;

        bucket.SetAvgUs(avgUs);
        bucket.SetMaxUs(m_maxFrameUs);

        // Advance ring buffer.

        m_writeIndex =
            (m_writeIndex + 1) % BUCKET_COUNT;

        if (m_bucketCount < BUCKET_COUNT)
            ++m_bucketCount;

        // Reset accumulators.

        m_accumulatedUs = 0;
        m_accumulatedFrames = 0;
        m_maxFrameUs = 0;
    }

private:

    FrameBucket m_buckets[BUCKET_COUNT];

    uint32_t m_writeIndex = 0;
    uint32_t m_bucketCount = 0;

    uint64_t m_accumulatedUs = 0;
    uint32_t m_accumulatedFrames = 0;
    uint32_t m_maxFrameUs = 0;

    uint32_t m_bucketElapsedUs = 0;
};

// ============================================================================
// Example usage
// ============================================================================

int main()
{
    FpsHistory history;

    history.Reset();

    // Simulate ~60 FPS.

    for (int i = 0; i < 500; ++i)
    {
        uint32_t frameUs = 16667;

        // Simulate occasional stutter.

        if ((i % 120) == 0)
        {
            frameUs = 50000; // 50 ms hitch
        }

        history.AddFrame(frameUs);
    }

    printf(
        "Average FPS: %.2f\n",
        history.GetAverageFPS());

    printf(
        "Approx 1%% Low FPS: %.2f\n",
        history.GetOnePercentLowFPS());

    return 0;
}

This data organization approach uses just 1200 bytes at most. Bucket intervals could also be longer for even less size.

It loses value accuracy. It optimizes for speed over accuracy.

However, your current implemention would perhaps be more efficient at low frame rates, because less values need reading at small frame counts. I do like that aspect, because low fps runtimes need to do less.

I suggest to look into this more how to find a good balance between size,speed and accuracy. There are many options here.

…ootprint.

Fast Indexing: Replaced slow modulo division with fast bitwise index masking.
Loop Optimization: Eliminated per-iteration modulo checks from history search loops.
Safe Sampling: Clamped minimum frame time to prevent division by zero errors.
Removed Branching: Pre-primed timer in display init to remove redundant per-frame branches.
Cleaned Capping: Used RenderFpsPreset constant instead of magic zero value for uncapped.
Improved Windows: Widened average window and boosted low-percent telemetry update rates.
void W3DDisplay::addFpsSample(Real elapsedSeconds)
{
if (elapsedSeconds <= 0.0f)
if (elapsedSeconds < 0.0001f)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this clamp necessary? In what event is the frame 0 seconds long?

for (Int i = 0; i < m_historyCount; ++i)
{
Int idx = (m_historyOffset - 1 - i + FPS_HISTORY_SIZE) % FPS_HISTORY_SIZE;
if (idx < 0) idx += FPS_HISTORY_SIZE;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing line break

Multiple times.

static UnsignedInt lastLowUpdate = 0;
UnsignedInt now = timeGetTime();
if (now - lastLowUpdate >= 1000)
if (now - lastLowUpdate >= 100) // update low 1% metrics at 100ms intervals instead of 1000ms since it is now extremely cheap
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "is now extremely cheap" mean? If it was extremely cheap, it would be updated every frame, but it is updated 10 times a second.

for (Int i = 0; i < m_historyCount; ++i)
{
Int idx = (m_historyOffset - 1 - i + FPS_HISTORY_SIZE) % FPS_HISTORY_SIZE;
if (idx < 0) idx += FPS_HISTORY_SIZE;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this branch much better than the arithmetic from before?

if (now - lastLowUpdate >= 100) // update low 1% metrics at 100ms intervals instead of 1000ms since it is now extremely cheap
{
lastLowUpdate = now;
m_low1PercentFPS = calculateLow1PercentFPS(3.0f);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about instead of updating every 100 ms, lazily update it when getLow1PercentFPS is called. Then it will be updated just once a second tops exactly when needed.

m_historyCount = 1;
m_lastUpdateTime64 = 0;
m_currentFPS = 30.0f;
for (Int h = 0; h < FPS_HISTORY_SIZE; ++h)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can also use std::fill one after the other.

Real m_currentFPS; ///<current fps value.

enum { FPS_HISTORY_SIZE = 5000 }; // covers 5s at 1000 FPS, degrades gracefully beyond
enum { FPS_HISTORY_SIZE = 4096 }; ; // degrades gracefully beyond this size
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 semicolons

Int idx = (m_historyOffset - 1 - i + FPS_HISTORY_SIZE) % FPS_HISTORY_SIZE;
if (idx < 0) idx += FPS_HISTORY_SIZE;
timeSum += m_durationHistory[idx];
m_sortBuffer[sampleCount++] = m_fpsHistory[idx];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the sort buffer can be on the stack instead? Should be a lot cheaper.


m_currentFPS = 1.0f / elapsedSeconds;
m_fpsHistory[m_historyOffset] = m_currentFPS;
m_durationHistory[m_historyOffset] = elapsedSeconds;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we store both, fps and frametime, when they can be inferred from each other?

enum { FPS_HISTORY_SIZE = 5000 }; // covers 5s at 1000 FPS, degrades gracefully beyond
enum { FPS_HISTORY_SIZE = 4096 }; ; // degrades gracefully beyond this size
Real m_fpsHistory[FPS_HISTORY_SIZE];
Real m_durationHistory[FPS_HISTORY_SIZE];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try store 2 byte unsigned values for quantized microseconds (use 16 microseconds precision).

Together with the other suggestions, the member data size would then shrink from 49 kb to 8 kb.

Quick-bench shows quantized integer will be 6 to 7 times faster than the float version (gcc, clang, O3).

https://quick-bench.com/q/CnVv3pcyDt3r1ikoGCELHiWgzqM

#include <array>
#include <cstdint>
#include <random>

static void FloatArrayAccumulate(benchmark::State& state) {
  std::array<float, 4096> values;

  std::mt19937 rng(12345);
  std::uniform_real_distribution<float> dist(0.0f, 1.0f);

  // Fill array with random floats
  for (float& v : values) {
    v = dist(rng);
  }

  for (auto _ : state) {
    // Accumulate values
    float sum = 0.0f;
    for (float v : values) {
      sum += v;
    }

    benchmark::DoNotOptimize(sum);
  }
}
BENCHMARK(FloatArrayAccumulate);

static void UShortArrayAccumulateMul16(benchmark::State& state) {
  std::array<unsigned short, 4096> values;

  std::mt19937 rng(12345);
  std::uniform_int_distribution<unsigned short> dist(0, 65535);

  // Fill array with random unsigned shorts
  for (unsigned short& v : values) {
    v = dist(rng);
  }

  static constexpr uint32_t QUANTUM_SHIFT = 4; // represents 16 microseconds

  for (auto _ : state) {
    // Accumulate values shifted by 4
    uint64_t sum = 0;
    for (unsigned short v : values) {
      sum += static_cast<uint64_t>(v) << QUANTUM_SHIFT;
    }

    benchmark::DoNotOptimize(sum);
  }
}
BENCHMARK(UShortArrayAccumulateMul16);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Enhancement Is new feature or request Gen Relates to Generals GUI For graphical user interface Minor Severity: Minor < Major < Critical < Blocker ZH Relates to Zero Hour

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants